Covariate Shift by Kernel Mean Matching
Authors
Abstract
Given sets of observations of training and test data, we consider the problem of re-weighting the training data such that its distribution more closely matches that of the test data. We achieve this goal by matching covariate distributions between training and test sets in a high dimensional feature space (specifically, a reproducing kernel Hilbert space). This approach does not require distribution estimation. Instead, the sample weights are obtained by a simple quadratic programming procedure. We provide a uniform convergence bound on the distance between the reweighted training feature mean and the test feature mean, a transductive bound on the expected loss of an algorithm trained on the reweighted data, and a connection to single class SVMs. While our method is designed to deal with the case of simple covariate shift (in the sense of Chapter ??), we have also found benefits for sample selection bias on the labels. Our correction procedure yields its greatest and most consistent advantages when the learning algorithm returns a classifier/regressor that is “simpler” than the data might suggest.
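As a rough illustration of the procedure described above, the sketch below frames KMM as a quadratic program: minimize the squared RKHS distance between the reweighted training feature mean and the test feature mean, over non-negative, bounded weights whose average stays near one. The RBF kernel, the bandwidth sigma, the weight cap B, the tolerance eps, and the use of a general-purpose SLSQP routine in place of a dedicated QP solver are illustrative assumptions, not choices prescribed by the chapter.

```python
# Minimal sketch of kernel mean matching (KMM) as a quadratic program,
# following the description above: find non-negative weights beta for the
# training points so that the weighted training feature mean matches the
# test feature mean in an RBF kernel feature space.
# Assumptions (not prescribed by the chapter): RBF kernel, bandwidth sigma,
# weight cap B, tolerance eps, and SLSQP as a stand-in QP solver.
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(A, C, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of C."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(C**2, axis=1)[None, :] - 2 * A @ C.T
    return np.exp(-sq / (2 * sigma**2))


def kmm_weights(X_tr, X_te, sigma=1.0, B=10.0, eps=None):
    n_tr, n_te = len(X_tr), len(X_te)
    eps = B / np.sqrt(n_tr) if eps is None else eps
    K = rbf_kernel(X_tr, X_tr, sigma)                                   # n_tr x n_tr
    kappa = (n_tr / n_te) * rbf_kernel(X_tr, X_te, sigma).sum(axis=1)   # length n_tr

    # Objective 0.5 * b'Kb - kappa'b is, up to constants, the squared RKHS
    # distance between the reweighted training mean and the test mean.
    obj = lambda b: 0.5 * b @ K @ b - kappa @ b
    grad = lambda b: K @ b - kappa

    # Weights lie in [0, B] and their sum stays within n_tr * (1 +/- eps).
    cons = [
        {"type": "ineq", "fun": lambda b: n_tr * (1 + eps) - b.sum()},
        {"type": "ineq", "fun": lambda b: b.sum() - n_tr * (1 - eps)},
    ]
    res = minimize(obj, np.ones(n_tr), jac=grad, bounds=[(0, B)] * n_tr,
                   constraints=cons, method="SLSQP", options={"maxiter": 500})
    return res.x


# Toy usage: training covariates from N(0, 1), test covariates from N(1, 1).
rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(200, 1))
X_te = rng.normal(1.0, 1.0, size=(200, 1))
beta = kmm_weights(X_tr, X_te, sigma=1.0)
# Training points lying where the test density is higher should get larger weights.
print(beta[X_tr[:, 0] > 0.5].mean(), beta[X_tr[:, 0] < -0.5].mean())
```

For large training sets one would typically hand the same quadratic program to a dedicated QP solver, since the Gram matrix K is dense and SLSQP is only a convenient stand-in here.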
Similar resources
Analysis of Kernel Mean Matching under Covariate Shift
In real supervised learning scenarios, it is not uncommon for the training and test samples to follow different probability distributions, making it necessary to correct the resulting sampling bias. Focusing on a particular covariate shift problem, we derive high-probability confidence bounds for the kernel mean matching (KMM) estimator, whose convergence rate turns out to depend on some regularity...
Correcting Covariate Shift with the Frank-Wolfe Algorithm
Covariate shift is a fundamental problem for learning in non-stationary environments where the conditional distribution p(y|x) is the same between training and test data while their marginal distributions p_tr(x) and p_te(x) are different. Although many covariate shift correction techniques remain effective for real world problems, most do not scale well in practice. In this paper, using inspirat...
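To make the scaling point above concrete, here is a minimal Frank-Wolfe (conditional gradient) sketch on a simplex-constrained KMM-style objective. The objective, feasible set, step size, and helper name frank_wolfe_kmm are our own illustrative assumptions, not the formulation of the cited paper.

```python
# Minimal Frank-Wolfe (conditional gradient) sketch for a simplex-constrained
# KMM-style objective 0.5 * b'Kb - kappa'b over {b >= 0, sum(b) = n}.
# Illustrative assumptions only: the feasible set, the 2/(t+2) step size, and
# the helper name frank_wolfe_kmm are ours, not the cited paper's formulation.
import numpy as np


def frank_wolfe_kmm(K, kappa, n_iter=200):
    n = len(kappa)
    beta = np.ones(n)                        # feasible start: uniform weights
    for t in range(n_iter):
        grad = K @ beta - kappa              # gradient of 0.5 * b'Kb - kappa'b
        s = np.zeros(n)
        s[np.argmin(grad)] = n               # linear minimization over n * simplex
        gamma = 2.0 / (t + 2.0)              # standard Frank-Wolfe step size
        beta = (1 - gamma) * beta + gamma * s
    return beta


# Toy usage with a linear-plus-constant kernel on 1-D covariates.
rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=50)
X_te = rng.normal(1.0, 1.0, size=80)
K = np.outer(X_tr, X_tr) + 1.0
kappa = (len(X_tr) / len(X_te)) * (np.outer(X_tr, X_te) + 1.0).sum(axis=1)
print(frank_wolfe_kmm(K, kappa).round(2))
```

Each iteration touches only a matrix-vector product and a coordinate argmin, which is the usual motivation for conditional gradient methods when the full quadratic program would be too large for an off-the-shelf QP solver.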
Discriminative Learning Under Covariate Shift
We address classification problems for which the training instances are governed by an input distribution that is allowed to differ arbitrarily from the test distribution—problems also referred to as classification under covariate shift. We derive a solution that is purely discriminative: neither training nor test distribution are modeled explicitly. The problem of learning under covariate shif...
Linear-Time Estimators for Propensity Scores
We present linear-time estimators for three popular covariate shift correction and propensity scoring algorithms: logistic regression (LR), kernel mean matching (KMM) [19], and maximum entropy mean matching (MEMM) [20]. This allows applications in situations where both treatment and control groups are large. We also show that the last two algorithms differ only in their choice of regularizer (l2 of...
Kernel Robust Bias-Aware Prediction under Covariate Shift
Under covariate shift, training (source) data and testing (target) data differ in input space distribution, but share the same conditional label distribution. This poses a challenging machine learning task. Robust Bias-Aware (RBA) prediction provides the conditional label distribution that is robust to the worst-case logarithmic loss for the target distribution while matching feature expectation...
Publication date: 2008